Using Inverse Lexical Rules to Acquire a Wide-coverage Lexicalized Grammar

Authors

  • Hiroko Nakanishi
  • Yusuke Miyao
  • Jun’ichi Tsujii
Abstract

Automatic grammar extraction from annotated corpora (Xia, 1999; Chen and Vijay-Shanker, 2000; Chiang, 2000; Hockenmaier and Steedman, 2002; Miyao et al., 2004) has enabled us to build a wide-coverage lexicalized grammar at low cost. These methods extract a large number of lexical entries with little effort, whereas conventional methods allow only a limited set of lexical entries to be acquired for real-world texts. Lexicalized grammars require many lexical entries to account for various syntactic alternations, yet we can hardly expect every word to appear in every possible syntactic alternation within a limited training corpus. In this work, we aimed to improve the coverage of an automatically extracted grammar by using lexical rules (Jackendoff, 1975; Pollard and Sag, 1994). The idea behind lexical rules is that the syntactic constraints of a group of words are derived by general rules from their lexemes, which express the characteristics common to the group (e.g., “runs” and “running” are derived from the lexeme “run”). We automatically acquired lexemes by applying lexical rules inversely to the lexical entries of the HPSG grammar automatically extracted from the Penn Treebank (Marcus et al., 1993). We could then generate a wide range of lexical entries from these lexemes, and our grammar achieved higher coverage of real-world texts. Although the lexical rules proposed by Pollard and Sag (1994) cover several parts of speech, such as nouns and adjectives, we formulated rules only for verbs. This is because verbs ...

[Figure: HPSG lexical entry for “scold” (PHON “scold”; CAT: HEAD verb, MODL null, MODR null, SUBJ < HEAD noun >, COMPS < HEAD noun >; NONLOCAL: SLASH < >)]
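The inverse-lexical-rule idea can be illustrated with a minimal Python sketch. It assumes a drastically simplified lexical entry (PHON, HEAD, a hypothetical VFORM feature, SUBJ and COMPS only) and two toy rules, third-person singular and passive, with naive suffix morphology. The names `Entry`, `LexicalRule`, `to_lexemes` and `expand` are hypothetical; this is not the authors' implementation, which operates on full HPSG feature structures extracted from the Penn Treebank.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Entry:
    """Toy lexical entry (hypothetical); real HPSG entries are full feature structures."""
    phon: str     # surface form, e.g. "scolds"
    head: str     # part of speech, e.g. "verb"
    vform: str    # hypothetical inflection feature: "base", "3sg" or "passive"
    subj: tuple   # SUBJ list, e.g. ("noun",)
    comps: tuple  # COMPS list, e.g. ("noun",)


@dataclass(frozen=True)
class LexicalRule:
    name: str
    forward: Callable[[Entry], Optional[Entry]]  # lexeme -> derived entry
    inverse: Callable[[Entry], Optional[Entry]]  # derived entry -> lexeme


def third_sg_fwd(e: Entry) -> Optional[Entry]:
    if e.head != "verb" or e.vform != "base":
        return None
    return Entry(e.phon + "s", "verb", "3sg", e.subj, e.comps)


def third_sg_inv(e: Entry) -> Optional[Entry]:
    if e.head != "verb" or e.vform != "3sg" or not e.phon.endswith("s"):
        return None
    return Entry(e.phon[:-1], "verb", "base", e.subj, e.comps)


def passive_fwd(e: Entry) -> Optional[Entry]:
    # Only verbs with a complement passivize: the first complement becomes the subject.
    if e.head != "verb" or e.vform != "base" or not e.comps:
        return None
    return Entry(e.phon + "ed", "verb", "passive", e.comps[:1], e.comps[1:])


def passive_inv(e: Entry) -> Optional[Entry]:
    # Move the passive subject back to COMPS; the restored agent is assumed to be a noun.
    if e.head != "verb" or e.vform != "passive" or not e.phon.endswith("ed"):
        return None
    return Entry(e.phon[:-2], "verb", "base", ("noun",), e.subj + e.comps)


RULES = [
    LexicalRule("3sg", third_sg_fwd, third_sg_inv),
    LexicalRule("passive", passive_fwd, passive_inv),
]


def to_lexemes(entries, rules):
    """Apply each rule inversely to recover the lexemes behind observed entries."""
    lexemes = set()
    for e in entries:
        if e.vform == "base":
            lexemes.add(e)  # an uninflected entry is treated as its own lexeme
        for r in rules:
            lexeme = r.inverse(e)
            if lexeme is not None:
                lexemes.add(lexeme)
    return lexemes


def expand(lexemes, rules):
    """Apply every rule forward to each lexeme to generate entries not seen in training."""
    entries = set(lexemes)
    for lx in lexemes:
        for r in rules:
            derived = r.forward(lx)
            if derived is not None:
                entries.add(derived)
    return entries


if __name__ == "__main__":
    # Suppose the extracted grammar saw only the transitive 3sg form "scolds".
    observed = [Entry("scolds", "verb", "3sg", ("noun",), ("noun",))]
    lexemes = to_lexemes(observed, RULES)
    for e in sorted(expand(lexemes, RULES), key=lambda x: x.phon):
        print(e.phon, e.vform, e.subj, e.comps)
```

Running the script lists entries for "scold", "scolded" and "scolds", although only "scolds" was observed: one inverse step recovers the lexeme, and forward application then yields the unseen passive alternation. A realistic system would also need irregular morphology and some control over over-generation, both of which this toy ignores.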

Similar resources

Deriving Information Structure from Prosodically Marked Text with Lexicalized Tree Adjoining Grammars

This paper proposes a method for integrating intonation and information structure into the Lexicalized Tree Adjoining Grammar (LTAG) formalism. The method works fully within LTAG and requires no changes or additions to the basic formalism. From the existing CCG analysis, we denote boundary tones as lexical items and pitch accents as features of lexical items. We then show how prosodically marke...

Encoding Lexicalized Tree Adjoining Grammars with a Nonmonotonic Inheritance Hierarchy

This paper shows how DATR, a widely used formal language for lexical knowledge representation, can be used to define an LTAG lexicon as an inheritance hierarchy with internal lexical rules. A bottom-up featural encoding is used for LTAG trees and this allows lexical rules to be implemented as covariation constraints within feature structures. Such an approach eliminates the considerable redu...

Alpino: Wide-coverage Computational Analysis of Dutch

Alpino is a wide-coverage computational analyzer of Dutch which aims at accurate, full, parsing of unrestricted text. We describe the head-driven lexicalized grammar and the lexical component, which has been derived from existing resources. The grammar produces dependency structures, thus providing a reasonably abstract and theory-neutral level of linguistic representation. An important aspect ...

Automating the Generation of a Wide-coverage LFG for French using a MetaGrammar

In this paper, we explain how the notion of MetaGrammar, which has successfully been used for generating wide-coverage tree adjoining grammars (TAGs) for various languages such as French (Abeillé et al. (1999)) and German (Gerdes (2002)), may be used to generate a wide-coverage Lexical Functional Grammar (LFG) for French. We first introduce the notion of MetaGrammar and present the tools we use...

Ambiguity Resolution for Machine Translation of Telegraphic Messages

Telegraphic messages with numerous instances of omission pose a new challenge to parsing in that a sentence with omission causes a higher degree of ambiguity than a sentence without omission. Misparsing reduced by omissions has a far-reaching consequence in machine translation. Namely, a misparse of the input often leads to a translation into the target language which has incoherent meaning in ...

Publication date: 2004